Database 2 — TiTilda Notes

Index

Transaction

A transaction is an atomic unit of work performed by an application.

A transaction starts with a begin-transaction statement and ends with an end-transaction statement, such as BEGIN TRANSACTION followed by COMMIT (to make its changes permanent) or ROLLBACK (to undo them).

ACID

Transactions support the ACID properties:

  1. Atomicity: the transaction is executed entirely or not at all (all-or-nothing); partial executions are rolled back.
  2. Consistency: the transaction moves the database from one consistent state to another, respecting integrity constraints.
  3. Isolation: the execution of a transaction is not affected by the concurrent execution of other transactions.
  4. Durability: once committed, the effects of a transaction are permanent, even in case of failures.

Concurrency

While executing transactions serially (one after the other) guarantees correctness, it severely impacts performance and system efficiency. To maximize resource utilization, a DBMS uses concurrency control to allow the operations of multiple transactions to be interleaved (run in parallel).

An interleaved execution sequence is called a schedule.

Concurrency Issues

If operations are interleaved poorly, it can lead to anomalies (or consistency violations). The primary anomalies are:

Lost Update

This anomaly occurs when two transactions read the same value, and a transaction’s update is overwritten by another concurrent transaction, effectively losing the first update.

  1. r_1(x): T_1 reads x
  2. r_2(x): T_2 reads x
  3. w_1(x): T_1 writes x += 10
  4. w_2(x): T_2 writes x += 5

Dirty Read

This anomaly occurs when a transaction T_2 reads a value updated by another transaction T_1 that has not yet been committed. If T_1 later aborts, T_2 will have read a value that was never actually committed to the database.

  1. w_1(x): T_1 writes x (uncommitted)
  2. r_2(x): T_2 reads the uncommitted x
  3. \text{abort}_1: T_1 aborts
  4. w_2(x): T_2 writes x (based on dirty read)

Non Repeatable Read

This anomaly occurs when a transaction T_1 reads the same value multiple times and gets different results, because another transaction T_2 has modified and committed the value between the reads.

  1. r_1(x): T_1 reads x
  2. w_2(x): T_2 updates x and commits
  3. r_1(x): T_1 reads x again and gets a different value

Phantom Update

This anomaly occurs when a transaction T_1 reads a set of values bound by a constraint while another transaction T_2 updates some of those values in a way that still satisfies the constraint. Since T_1 reads part of the data before the update and part after it, the values it observes appear to violate the constraint.

Constraint: x + y = 10

  1. r_1(x): T_1 reads x
  2. w_2(y+5) & w_2(x-5): T_2 updates the x and y values without violating the constraint
  3. r_1(y): T_1 reads y and sees the constraint as violated (the x value it read dates from before the update)

Phantom Insert

This anomaly occurs when a transaction T_1 reads a set of records that satisfy a certain condition, then another transaction T_2 inserts new records that satisfy the condition. If T_1 reads the records again, it will get a different result.

  1. r_1(\sigma_{x>5}(R)): T_1 reads the set of records where x > 5
  2. w_2(\text{insert }(x=6)): T_2 inserts a new record with x = 6
  3. r_1(\sigma_{x>5}(R)): T_1 reads the set of records again and gets a different result

Scheduling

The DBMS must schedule the operations of the transactions in a way that preserves the order of operations inside each transaction.

The operations performed by a transaction T_n on a data item x are:

  1. r_n(x): the read of x by T_n;
  2. w_n(x): the write of x by T_n.

The transactions can be scheduled in three ways:

Assuming we have n transactions T_1, T_2, \ldots, T_n where each transaction T_i has k_i operations, the number of distinct schedules (that respect the sequence of the operations) is:

N_D = \dfrac{(\sum_{i=1}^n{k_i})!}{\prod_{i=1}^n{k_i!}}

Of these, only a small fraction are serial:

N_S = n!

Serializable Schedule

Among all the possible schedules, some may produce anomalies due to concurrency.

We need to identify the serializable schedules: those that leave the database in the same state as some serial schedule of the same transactions.

Some assumptions are:

  1. Commit-projection: only the operations of committed transactions are considered (aborts are ignored);
  2. A-posteriori analysis: the schedule is analyzed after the fact, with all its operations known.

View-Serializable Schedules (VSR)

Two schedules are view-equivalent if they have the same:

  1. operations;
  2. reads-from relation (each read operation reads the value produced by the same write operation);
  3. final writes (the last write on each data item comes from the same transaction).

A schedule is view-serializable if it is view-equivalent to a serial schedule; being equivalent to a serial schedule guarantees that there are no concurrency issues (under the assumptions above).

One way to test view-serializability would be to enumerate all possible serial schedules (a factorial number of them) and check view-equivalence against each, which is computationally intractable.

To obtain a polynomial-time test we need to add some restrictions, leading to a stricter but computable notion of serializability.

Conflict Serializable Schedules (CSR)

A conflict occurs if two different transactions perform operations on the same data and at least one of the operations is a write.

Two schedules are conflict-equivalent if all the conflicting pairs of operations occur in the same order in both. A schedule is conflict-serializable (CSR) if it is conflict-equivalent to a serial schedule.

CSR is a subset of VSR (CSR \subseteq VSR).

Testing whether a schedule is conflict-serializable is done by checking whether its conflict graph is acyclic:

  1. Given a schedule, group the operations by the resource they access.
  2. Then add an arc from T_i to T_j for every conflict in which an operation of T_i precedes a conflicting operation of T_j.

Topologically sorting the graph (when it is acyclic) yields an equivalent serial schedule.
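As an illustrative sketch (not taken from the notes), the conflict-graph test can be coded as follows; the schedule representation as (transaction, operation, resource) triples and the function names are assumptions made for the example.

```python
from collections import defaultdict

def conflict_graph(schedule):
    """Build the conflict graph of a schedule.

    `schedule` is a list of (transaction, op, resource) triples, e.g.
    ("T1", "r", "x"); two operations conflict if they belong to different
    transactions, touch the same resource, and at least one is a write.
    """
    edges = defaultdict(set)
    for i, (ti, op_i, res_i) in enumerate(schedule):
        for tj, op_j, res_j in schedule[i + 1:]:
            if ti != tj and res_i == res_j and "w" in (op_i, op_j):
                edges[ti].add(tj)
    return edges

def serial_order(edges, transactions):
    """Topological sort (Kahn's algorithm); returns None if the graph is
    cyclic, i.e. the schedule is not conflict-serializable."""
    indeg = {t: 0 for t in transactions}
    for src in edges:
        for dst in edges[src]:
            indeg[dst] += 1
    ready = [t for t, d in indeg.items() if d == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for dst in edges[t]:
            indeg[dst] -= 1
            if indeg[dst] == 0:
                ready.append(dst)
    return order if len(order) == len(transactions) else None

# Example: r1(x) r2(x) w1(x) w2(x)  ->  cycle T1 <-> T2, so not CSR
sched = [("T1", "r", "x"), ("T2", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]
print(serial_order(conflict_graph(sched), {"T1", "T2"}))  # None
```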

Tips:

\text{Serial} \subseteq \text{CSR} \subseteq \text{VSR} \subseteq \text{All}

Concurrency Control

Since transactions arrive in real-time, the scheduler must dynamically decide the execution order, which might not match the arrival order.

Concurrency control can be implemented with two techniques: pessimistic (lock-based) and optimistic (timestamp-based).

Pessimistic concurrency control

Pessimistic concurrency control assumes that conflicts will occur and takes steps to prevent them. This is done by using locks to control access to resources.

There are two types of locks:

The lock system is implemented with a Lock Table, a hash table where each resource (table, index, row, or column) has a queue of the transactions that are holding or waiting for the lock.

Two-phase Locking (2PL)

Plain lock-based scheduling is not enough to avoid anomalies: after a transaction releases a lock, another transaction might access the resource while the first one is still running, creating anomalies.

Two-phase Locking (2PL) fixes this by forbidding a transaction from acquiring any new lock after it has released one, separating the locking activity into three phases:

  1. Growing phase: the transaction acquires locks without releasing any.
  2. Plateau: the transaction has obtained all the locks it needs and is neither acquiring nor releasing any locks.
  3. Shrinking phase: the transaction releases locks without obtaining any new ones.

After the last phase the transaction must end with a commit or an abort.

This scheduling strategy generates a new serializability class, called 2PL, that is a subset of CSR.

This class avoids the anomalies related to synchronization and removes the a-posteriori hypothesis: the scheduler no longer needs to know the whole schedule in advance.

\text{Serial} \subseteq \text{2PL} \subseteq \text{CSR} \subseteq \text{VSR} \subseteq \text{All}

Strict Two-phase Locking (Strict 2PL)

To avoid anomalies related to aborts (dirty reads) we need Strict 2PL, which releases the locks only after the end-transaction statement (commit or abort).

This introduces long-duration locks that decrease performance, but removes the commit-only hypothesis.

\text{Serial} \subseteq \text{Strict 2PL} \subseteq \text{2PL} \subseteq \text{CSR} \subseteq \text{VSR} \subseteq \text{All}

Predicate Locking

To avoid phantom anomalies, where a range query returns different results when re-executed, the locking must be extended to future data.

This is done by introducing a predicate lock that locks the access path defined by the WHERE clause, preventing other transactions from inserting, modifying, or deleting data that would satisfy the predicate.

SQL

SQL defines several isolation levels for read operations:

  1. READ UNCOMMITTED: allows dirty reads, non-repeatable reads, and phantoms;
  2. READ COMMITTED: prevents dirty reads, but allows non-repeatable reads and phantoms;
  3. REPEATABLE READ: prevents dirty and non-repeatable reads, but allows phantom inserts;
  4. SERIALIZABLE: prevents all the anomalies above, including phantoms.

A higher isolation level reduces the number of possible anomalies, but introduces delays and the risk of deadlock/starvation.

To avoid unbounded waiting, the DBMS can adopt a kill mechanism such as timeouts, deadlock prevention, or deadlock detection.

Deadlock Detection

To detect deadlocks on a single machine you can look at the wait-for graph, where each transaction is a node and there is an arc from T_i to T_j if T_i is waiting for a resource held by T_j.

In a distributed system you need a distributed deadlock detection algorithm like the Obermarck Algorithm, based on the idea of communicating dependencies between nodes. After a few iterations, a node can spot a cycle in the dependency graph, indicating a deadlock.

A dependency sequence is communicated only if the id of the first transaction in the sequence is greater than the id of the last one (this avoids sending the same information twice), and it is sent to the node where the last transaction is waiting:

Node A: N_B \rightarrow T_2 \rightarrow T_3 \rightarrow T_1 \rightarrow N_B

Node B: N_A \rightarrow T_1 \rightarrow T_4 \rightarrow T_2 \rightarrow N_A

Node A sends to Node B: T_2 \rightarrow T_1 (only distributed dependencies are sent)

Node B can now reconstruct the cycle T_1 \rightarrow T_2 \rightarrow T_1 and detect the deadlock.
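A minimal sketch of a local deadlock check on a wait-for graph (the dictionary representation of the graph and the function name are assumptions):

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph given as {transaction: set of
    transactions it is waiting for}. A cycle means a deadlock."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:      # back edge -> cycle
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1 -> deadlock
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
```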

The probability of conflict is O(\frac{1}{n}) and the probability of a deadlock is O(\frac{1}{n^2}), where n is the number of records in a file. Longer transactions have a higher probability of conflict.

To avoid deadlocks, two common techniques are:

  1. Update locks: a transaction that intends to read and later write an item requests an update lock, avoiding the classic deadlock caused by two transactions upgrading their shared locks at the same time;
  2. Hierarchical (multi-granularity) locking: locks are requested top-down on coarser objects (table, page) before finer ones (tuple), using intention locks (ISL, IXL), shared/exclusive locks (SL, XL), and the combined SIXL. The compatibility matrix is:

| Request | Free | ISL | IXL | SL | SIXL | XL |
|---------|------|-----|-----|----|------|----|
| ISL     | Y    | Y   | Y   | Y  | Y    | N  |
| IXL     | Y    | Y   | Y   | N  | N    | N  |
| SL      | Y    | Y   | N   | Y  | N    | N  |
| SIXL    | Y    | Y   | N   | N  | N    | N  |
| XL      | Y    | N   | N   | N  | N    | N  |
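A small, purely illustrative sketch of how the matrix above can be encoded and queried; the dictionary layout and function name are assumptions.

```python
# Compatibility matrix from the table above: COMPAT[requested][current] is True
# when the requested lock can be granted while `current` is already held.
COMPAT = {
    "ISL":  {"Free": True,  "ISL": True,  "IXL": True,  "SL": True,  "SIXL": True,  "XL": False},
    "IXL":  {"Free": True,  "ISL": True,  "IXL": True,  "SL": False, "SIXL": False, "XL": False},
    "SL":   {"Free": True,  "ISL": True,  "IXL": False, "SL": True,  "SIXL": False, "XL": False},
    "SIXL": {"Free": True,  "ISL": True,  "IXL": False, "SL": False, "SIXL": False, "XL": False},
    "XL":   {"Free": True,  "ISL": False, "IXL": False, "SL": False, "SIXL": False, "XL": False},
}

def can_grant(requested, held_locks):
    """The requested lock is granted only if it is compatible with every lock
    currently held on the resource (an empty set means the resource is free)."""
    if not held_locks:
        return COMPAT[requested]["Free"]
    return all(COMPAT[requested][h] for h in held_locks)

print(can_grant("SL", {"ISL"}))   # True
print(can_grant("XL", {"ISL"}))   # False
```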

Optimistic concurrency control

Another way to manage concurrency is to use an optimistic approach that assumes that conflicts are rare and allows transactions to execute without acquiring locks. This is done by assigning timestamps to transactions and using these timestamps to determine the order of execution.

In a distributed system there is no global time, so each transaction is assigned a timestamp (TS) when it starts, formed from the id of the event and the id of the machine (event-id.machine-id).

The decisions are made based on the age of a transaction.

The system tracks, for each data item x, the Read Timestamp RTM(x) (the highest TS among the transactions that read x) and the Write Timestamp WTM(x) (the highest TS among the transactions that wrote x):

  1. r(x) requested with timestamp TS: rejected (the transaction is killed) if TS < WTM(x); otherwise accepted, and RTM(x) is set to \max(RTM(x), TS);
  2. w(x) requested with timestamp TS: rejected if TS < RTM(x) or TS < WTM(x); otherwise accepted, and WTM(x) is set to TS.

After being killed, the transaction restarts with a new (larger) timestamp.
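A minimal sketch of the basic timestamp-ordering checks described above, assuming an in-memory scheduler object and integer timestamps (class and method names are illustrative; the thomas_rule flag anticipates the rule discussed below):

```python
class BasicTS:
    """Basic (mono-version) timestamp ordering: keeps RTM/WTM per item and
    raises to signal that the requesting transaction must be killed."""

    def __init__(self):
        self.rtm = {}  # item -> highest TS that read it
        self.wtm = {}  # item -> highest TS that wrote it

    def read(self, ts, x):
        if ts < self.wtm.get(x, 0):
            raise RuntimeError(f"kill T{ts}: read of {x} is too old")
        self.rtm[x] = max(self.rtm.get(x, 0), ts)

    def write(self, ts, x, thomas_rule=False):
        if ts < self.rtm.get(x, 0):
            raise RuntimeError(f"kill T{ts}: write of {x} after a newer read")
        if ts < self.wtm.get(x, 0):
            if thomas_rule:
                return  # obsolete write: silently skipped (Thomas Write Rule)
            raise RuntimeError(f"kill T{ts}: write of {x} after a newer write")
        self.wtm[x] = ts

sched = BasicTS()
sched.read(5, "x")
sched.write(7, "x")
sched.write(6, "x", thomas_rule=True)  # skipped instead of killed
```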

The TS class is a subset of CSR, but it kills transactions even in some schedules that would be serializable.

The basic TS method assumes the commit-projection hypothesis (only committed transactions are considered). To remove it, a transaction that reads a value must wait for the commit of the transaction that wrote it.

(\text{Serial} \subseteq \text{Strict 2PL}) \nsubseteq \text{TS}_\text{mono} \subseteq \text{CSR} \subseteq \text{VSR} \subseteq \text{All}

Thomas Write Rule

To reduce the number of kills it is possible to use the Thomas Write Rule: a write w(x) with TS < WTM(x) is simply skipped instead of causing a kill, because a newer value already exists and the incoming write is obsolete (the write is still rejected if TS < RTM(x)).

With this rule the TS class is no longer a subset of CSR or VSR (for example, the accepted schedule w_2(x), w_1(x) has a different final write than the serial order, and is made harmless only because w_1(x) is skipped).

\text{TS}_\text{mono} \subseteq \text{TS}_\text{Thomas} \nsubseteq \text{VSR}

Multiversion Concurrency Control (MVCC)

MVCC allows multiple versions of a data item to exist, enabling readers to access the last committed version without being blocked by writers. Each transaction sees a consistent snapshot of the database at a specific point in time.

In MVCC, when a transaction wants to update a data item, it creates a new version rather than overwriting the existing one. This way, readers can continue to access the old version while the writer works on the new one.

In the theoretical approach, a write operation is killed only if TS < RTM(x); if TS < WTM(x), the write is accepted anyway and inserted as an older version, shifting the newer ones. This allows out-of-order writes.

In the practical approach, a write operation is killed also when TS < WTM(x), to avoid the shift operation.

\text{TS}_\text{mono} \subseteq \text{TS}_\text{multi} \nsubseteq \text{VSR}

Snapshot Isolation

Snapshot Isolation is another isolation level of the DBMS.

This level uses only the write timestamps: every transaction reads the most recent version committed before its own timestamp, i.e. a snapshot consistent with its start time.

Triggers

Triggers follow the Event-Condition-Action (ECA) paradigm: when an event E occurs, if a condition C is true, an action A is executed.

A rule engine is responsible for monitoring the events and executing the action when the condition is true.

CREATE TRIGGER trigger_name
{BEFORE | AFTER} 
{INSERT | UPDATE [of column_name] | DELETE} on table_name
[REFERENCING {OLD | NEW} [TABLE] [AS] reference_name]
[FOR EACH {ROW | STATEMENT}]
[WHEN (condition)]
BEGIN
    SQL_statements;
END;

Trigger Execution Mode

Triggers can execute either before or after the triggering event:

  1. BEFORE: the action runs before the event is applied to the database, typically to validate or adjust the incoming values;
  2. AFTER: the action runs after the event has been applied, typically to propagate its effects (the most common mode).

Trigger Granularity

Triggers can be defined at different levels of granularity:

  1. Row-level (FOR EACH ROW): the trigger fires once for each tuple affected by the event;
  2. Statement-level (FOR EACH STATEMENT): the trigger fires once for each triggering statement, regardless of how many tuples are affected.

Trigger Transition Variables

Triggers provide access to transition variables that represent the state of data before and after the event:

  1. OLD and NEW: the value of the affected row before and after the modification (NEW is undefined for DELETE, OLD for INSERT);
  2. OLD TABLE and NEW TABLE: the whole set of affected rows before and after the modification.

For row-level triggers, these variables refer to the specific row being processed. For statement-level triggers, they represent the entire set of affected rows as tables.

Trigger Lifecycle

Triggers follow a specific execution order to ensure proper sequencing:

  1. BEFORE statement trigger
  2. BEFORE row trigger
  3. Execute the original event and apply constraints.
  4. AFTER row trigger
  5. AFTER statement trigger

Note: Actions within triggers may trigger additional events, potentially causing cascading effects. Some DBMS restrict modifications to the same table being monitored to prevent infinite loops or inconsistencies.

Conditional Evaluation

A trigger can include a WHEN clause to specify a condition that must be met for the trigger to execute. This allows for more granular control over when the trigger’s action is performed.

Physical Database

Each query goes through multiple components before reaching the data in secondary memory: the optimizer (query manager), the access methods manager, the buffer manager, and the secondary memory (storage) manager.

A query can often be executed using multiple methods (execution plans). The DBMS must enumerate them, estimate their costs, and choose the most efficient one.

Tables are stored in files. Each file is divided into blocks in secondary memory, but the DBMS interacts with pages in main memory. The page size is defined by the OS. For simplification, we assume the block size is equal to the page size.

Operations involve moving blocks between secondary memory and main memory. Since disk I/O is the system bottleneck, optimization focuses on minimizing the time spent reading/writing to secondary memory.

The cost of a query is measured by the number of block transfers (I/O operations) required.

The DBMS manages file organization directly, often bypassing OS primitives to remove overhead.

Data Structures

The DBMS provides access methods to manipulate physical data structures.

A table consists of multiple tuples (rows). Each tuple contains multiple fields and may be of variable size. Tuples are stored in files organized into blocks.

A typical block structure:

packet
0-3: "Block Header"
4-7: "Page Header"
8-13: "Page Dictionary"
14-15: "Unused"
16-21: "Tuples"
22-23: "Checksum"
24-27: "Block Trailer"
28-31: "Page Trailer"

Where:

  1. the block header and trailer contain control information used by the file system;
  2. the page header and trailer contain control information used by the access method;
  3. the page dictionary contains pointers to the tuples stored in the page;
  4. the checksum is used to detect corrupted data.

A block can contain B tuples; B is called the blocking factor and is calculated as B = \lfloor SB / SR \rfloor, where SB is the block size and SR is the (average) size of a tuple.
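A quick numeric sketch of the blocking factor (the sizes are made-up values, just for illustration):

```python
SB = 8192   # assumed block size in bytes
SR = 300    # assumed average tuple size in bytes
B = SB // SR
print(B)    # 27 tuples per block; the remaining bytes stay unused
```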

Blocks can be organized in memory using different data structures.

Sequential Data Structures

The simplest structure is Sequential, where blocks are stored one after another.

There are two types:

  1. Heap (entry-sequenced) files: tuples are appended at the end of the file in insertion order;
  2. Sequentially ordered files: tuples are kept sorted on the value of a key field.

For ordered files, if a block is full during insertion, a shift of the following tuples is required. If no space is available, overflow blocks (temporary storage) are used until a reorganization is performed. Interval lookups on heaps require a full scan; on ordered files, the scan can stop as soon as the sort condition is no longer met.

Hashing-Based Access Structures

Hashing uses a function to map a key to a bucket where the data is stored. The hash store has N_b buckets, each the size of a block. The hash function maps a key to a value in [0, N_b - 1], which is the bucket index.

When a bucket reaches its maximum capacity (a collision that cannot be stored), an overflow chain of additional blocks is linked to the bucket.

The average length of the overflow chain is \alpha = \frac{T}{B \times N_b}, where T is the number of tuples in the file, B is the blocking factor, and N_b is the number of buckets.
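A small, illustrative sketch of how a key is mapped to a bucket and how the load factor above is computed; the sizes and the use of Python's built-in hash are assumptions:

```python
NB = 100          # assumed number of buckets
B = 27            # assumed blocking factor (tuples per block)
T = 2000          # assumed number of tuples in the file

def bucket_of(key):
    """Map a key to a bucket index in [0, NB - 1]."""
    return hash(key) % NB

alpha = T / (B * NB)    # load factor driving the overflow-chain length
print(bucket_of("alice"), alpha)
```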

Tree-Based Structures

Tree structures (usually balanced) restrict the maximum depth of any path. The most common implementation is B+ Trees:

graph TD
  Root[Internal Node: 10, 20]
  Root --> L1[Leaf: 5, 10]
  Root --> L2[Leaf: 15, 20]
  Root --> L3[Leaf: 25, 30]
  L1 -.-> L2
  L2 -.-> L3

Indexes

Indexes are auxiliary data structures that speed up search based on specific fields. They usually store just the field value and a pointer to the actual tuple.

Classification by Density:

  1. Dense index: one index entry for every tuple of the file;
  2. Sparse index: one entry per block (or group of tuples); it can be used only on sequentially ordered files.

Classification by Key Type:

  1. Primary index: built on the field that determines the physical ordering of the file (at most one per table);
  2. Secondary index: built on any other field; it does not affect the physical placement of the tuples.

Query Optimization

When a DBMS receives a query, it transforms a declarative statement (SQL) into the most efficient sequence of physical operations. The process follows these stages:

  1. Analysis: Lexical, syntactic, and semantic validation (e.g., checking if tables exist).
  2. Translation: SQL is converted into an internal Relational Algebra Tree.
  3. Algebraic (Logical) Optimization: Applies heuristic rules to restructure the tree.
  4. Cost-Based (Physical) Optimization: Uses statistics to choose specific algorithms and join orders.
  5. Code Generation: Produces the executable plan for the storage engine.

Algebraic Optimization (Heuristics)

This phase uses heuristics to rewrite the query tree without calculating specific costs, for example pushing selections and projections as close to the leaves (base tables) as possible to shrink intermediate results.

Cost-Based Optimization

The optimizer evaluates different “Physical Plans” and assigns a cost to each based on Disk I/O.

Statistics

To estimate costs, the DBMS maintains metadata (statistics) about tables, such as the number of tuples (cardinality), the tuple size, the number of blocks, and the number of distinct values of each attribute.

Selectivity & Result Estimation

Selectivity is the probability that a row satisfies a condition. Assuming uniformly distributed values, the selectivity of an equality predicate A = v is 1 / \text{val}(A), where \text{val}(A) is the number of distinct values of A; the estimated result cardinality is the table cardinality multiplied by the selectivity.

Sorting

Sorting is required for ORDER BY or DISTINCT. If the data doesn't fit in RAM, the DBMS uses External Merge Sort: the file is read in chunks that fit in memory, each chunk is sorted and written back as a sorted run, and the runs are then merged in one or more passes.
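A compact sketch of the two phases (sorted-run creation and k-way merge), using in-memory lists to stand in for disk pages; the run size and data are made-up:

```python
import heapq

def external_merge_sort(records, run_size):
    """External merge sort sketch: `run_size` plays the role of the number of
    records that fit in main memory at once."""
    # Phase 1: create sorted runs (each run would be written back to disk).
    runs = [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]
    # Phase 2: k-way merge of the runs (heapq.merge streams them lazily,
    # mimicking reading one page at a time from each run).
    return list(heapq.merge(*runs))

data = [42, 7, 19, 3, 25, 11, 30, 1]
print(external_merge_sort(data, run_size=3))  # [1, 3, 7, 11, 19, 25, 30, 42]
```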

Join Strategies

Joins are the most resource-intensive operations. The optimizer chooses a strategy based on data size and index availability: nested-loop join, merge-scan (sort-merge) join, or hash join.

Ranking

Traditional queries return exact results, where all retrieved tuples are considered equally relevant.

However, some results need to be ranked (distinct from mere sorting) based on some measure of quality or preferability.

Quality can be expressed on a spectrum of objectivity:

Ranking introduces a multi-objective optimization problem, where there are N objects described by M parameters, and the goal is to find the best k results.

Ranking can be performed using three main techniques: rank aggregation of multiple ranked lists, top-k queries based on a scoring function, and skyline queries.

Rank Aggregation

Rank aggregation is used to combine multiple ranked lists into a single consensus ranking. This is particularly useful when different voters or sources provide their own rankings of the same set of elements.

The main approaches are Borda voting, Condorcet voting, and metric-based approaches.

Borda Voting

Borda Voting assigns a score to each position in a ranked list. For example, in a list of n items, the top item receives n points, the second item receives n-1 points, and so on, down to the last item which receives 1 point.

The scores from all lists are summed for each item, and the final ranking is determined by sorting the items based on their total scores.
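A small sketch of Borda voting on made-up ranked lists (the function name and data are illustrative only):

```python
from collections import defaultdict

def borda(rankings):
    """Each list ranks the same items, best first: position 0 gets n points,
    position 1 gets n-1 points, ..., the last position gets 1 point. Totals
    decide the aggregated ranking."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    return sorted(scores, key=scores.get, reverse=True)

lists = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
print(borda(lists))  # ['a', 'b', 'c']
```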

Condorcet Voting

Condorcet Voting compares each pair of items across all ranked lists. For each pair, the item that is ranked higher in more lists is considered the winner of that pairwise comparison.

The item that wins the most pairwise comparisons is declared the overall winner. If there is no clear winner (i.e., a cycle exists), additional methods may be needed to resolve the ranking.
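An illustrative sketch of the pairwise comparisons; tie and cycle handling is deliberately minimal (the function simply returns None when no Condorcet winner exists):

```python
from itertools import combinations

def condorcet_winner(rankings):
    """Return the item that beats every other item in head-to-head comparisons
    across the ranked lists, or None if no such item exists (a cycle)."""
    items = rankings[0]
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        a_wins = sum(r.index(a) < r.index(b) for r in rankings)
        if a_wins > len(rankings) / 2:
            wins[a] += 1
        elif a_wins < len(rankings) / 2:
            wins[b] += 1
    champion = max(wins, key=wins.get)
    return champion if wins[champion] == len(items) - 1 else None

lists = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
print(condorcet_winner(lists))  # 'a'
```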

Metric-Based Approaches

These approaches aim to find a ranking that minimizes the distance to all input rankings based on a specific metric.

The most common metrics are the Kendall tau distance (the number of pairs of elements ranked in opposite order by the two rankings) and the Spearman footrule distance (the sum, over all elements, of the absolute difference between their positions).

MedRank

MedRank is a rank aggregation method that constructs a consensus ranking by iteratively selecting the best choices from multiple ranked lists.

Sorted access is performed on all the lists, one position at a time; an element is selected as soon as it has appeared in at least half of the lists (its median rank has been reached). The process stops when k elements have been selected.

This algorithm is instance-optimal: on every instance of the problem, it performs within a constant factor of the best possible algorithm.
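A sketch of MedRank following the description above, using only sorted access; the list format is an assumption made for the example:

```python
def medrank(rankings, k):
    """MedRank sketch: scan all ranked lists in parallel, one position per
    round (sorted access only); output an element once it has been seen in at
    least half of the lists, and stop after k elements."""
    majority = len(rankings) / 2
    seen = {}           # element -> number of lists it appeared in so far
    result = []
    for position in range(max(len(r) for r in rankings)):
        for ranking in rankings:
            if position >= len(ranking):
                continue
            element = ranking[position]
            seen[element] = seen.get(element, 0) + 1
            if seen[element] >= majority and element not in result:
                result.append(element)
                if len(result) == k:
                    return result
    return result

lists = [["a", "b", "c", "d"], ["b", "a", "d", "c"], ["a", "d", "b", "c"]]
print(medrank(lists, k=2))  # ['a', 'b']
```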

Ranking (top-k)

The goal of top-k ranking is to identify the best k elements from a set of N candidates, based on a scoring function S that aggregates M different attributes or parameters.

A naive approach involves calculating the score for every single tuple, which is prohibitively expensive for large datasets.

Geometrical Interpretation

Top-k problems can be visualized geometrically by representing each element as a point in an M-dimensional space. The “best” elements are those that minimize the distance to an ideal point (e.g., the origin or a maximum value) or maximize a utility function.

This can be implemented using k-nearest-neighbor search. Common distance metrics include the Euclidean distance (L_2), the Manhattan distance (L_1), and the Chebyshev distance (L_\infty).

It’s possible to add a weight to each parameter to reflect its importance in the scoring function.

Data Access Types

When fetching data from multiple sources (e.g., different indexes or microservices), the DBMS typically uses two types of access:

  1. Sorted access: the list is scanned sequentially in decreasing score order, returning the next element and its local score;
  2. Random access: given an element, its local score in a list is retrieved directly.

Scoring Functions

Scoring functions usually rely on the property of Monotonicity: if an attribute value increases (all the others being equal), the total score must not decrease. Common functions include the (weighted) sum, the average, MIN, and MAX.

B0 Algorithm

The B0 algorithm is a threshold-based approach specifically for the MAX (or MIN) scoring function.

  1. Perform k sorted accesses on each of the M lists.
  2. Collect all seen elements and calculate their total scores, even if partial.
  3. Sort the collected elements and return the top k.

Fagin’s Algorithm (FA)

Fagin’s algorithm works for any monotonic scoring function.

  1. Sorted Access Phase: Retrieve elements via sorted access from all M lists until at least k objects have been seen in every list.
  2. Random Access Phase: For every object seen during the first phase, perform random access to retrieve its missing attribute values.
  3. Calculation: Compute the final scores and return the top k.

Threshold Algorithm (TA)

TA is often more efficient than FA because it can stop much earlier.

  1. Perform sorted access on the M lists in parallel.
  2. For every new object seen, immediately use random access to fetch all its missing attributes and calculate its total score.
  3. Maintain a “Top-k” buffer of the best k objects found so far.
  4. Define a Threshold (\tau): \tau = S(\text{last\_seen}_1, \dots, \text{last\_seen}_M). This represents the maximum possible score any unseen object could achieve.
  5. Stopping Condition: Stop when the score of the k-th element in the buffer is at least \tau, since no unseen object can do better (see the sketch below).
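A sketch of TA for a SUM scoring function over two ranked lists; the data layout (sorted lists of (id, score) pairs plus dictionaries for random access) is an assumption made for the example:

```python
import heapq

def threshold_algorithm(sorted_lists, random_lookup, k):
    """sorted_lists: one list per attribute, each as [(object_id, local_score)]
    sorted by decreasing local_score. random_lookup[attr][obj] gives the local
    score of obj in attribute attr (random access). Score = sum of local scores."""
    best = {}                       # obj -> total score
    for depth in range(max(len(l) for l in sorted_lists)):
        last_seen = []
        for attr, lst in enumerate(sorted_lists):
            obj, score = lst[depth]
            last_seen.append(score)
            if obj not in best:     # random access for the missing attributes
                best[obj] = sum(random_lookup[a][obj]
                                for a in range(len(sorted_lists)))
        tau = sum(last_seen)        # max possible score of any unseen object
        top_k = heapq.nlargest(k, best.items(), key=lambda kv: kv[1])
        if len(top_k) == k and top_k[-1][1] >= tau:
            return top_k
    return heapq.nlargest(k, best.items(), key=lambda kv: kv[1])

l1 = [("a", 0.9), ("b", 0.8), ("c", 0.1)]
l2 = [("b", 0.7), ("a", 0.65), ("c", 0.2)]
lookup = [dict(l1), dict(l2)]
print(threshold_algorithm([l1, l2], lookup, k=1))  # [('a', 1.55)]
```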

NRA Algorithm (No Random Access)

Used when random access is impossible or too expensive. It maintains two bounds for each seen object:

  1. Lower bound (LB): the score computed from the attribute values seen so far, assuming the worst (e.g., 0) for the unseen ones;
  2. Upper bound (UB): the score computed using, for each unseen attribute, the last value returned by the corresponding list.

  1. Perform sorted access on all lists.
  2. Update LB, UB, and \tau.
  3. Maintain a Top-k buffer based on LB.
  4. Stopping Condition: Stop when the lower bound of the k-th element in the buffer is at least as large as both the threshold and the upper bound of every element outside the top-k: LB[k] \ge \max\{\max_{i > k}{UB[i]}, \tau\}.

Skyline

The Skyline operator identifies “Pareto-optimal” objects. An object belongs to the Skyline if it is not dominated by any other object.

The Skyline is useful when a single scoring function cannot be defined, as it provides all potentially “best” candidates regardless of attribute weights.

Block-Nested Loop Skyline (BNL)

This approach uses a temporary in-memory buffer (window) to store the candidate Skyline objects (a sketch follows the steps below).

  1. Initialize an empty Skyline Buffer in memory.
  2. For each object in the dataset:
    • Compare it with all objects in the Skyline Buffer.
    • If it is dominated by any object in the buffer, discard it.
    • If it dominates any objects in the buffer, remove those objects from the buffer.
    • If it is neither dominated nor dominates any object in the buffer, add it to the buffer.
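A minimal BNL sketch, assuming numeric attributes where lower values are better (the dominance direction and data are assumptions made for the example):

```python
def dominates(a, b):
    """a dominates b (assuming lower values are better on every attribute)
    if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def bnl_skyline(points):
    """Block-Nested Loop skyline: keep a buffer of current candidates and
    compare every incoming point against it."""
    buffer = []
    for p in points:
        if any(dominates(q, p) for q in buffer):
            continue                                       # p is dominated: discard it
        buffer = [q for q in buffer if not dominates(p, q)]  # drop dominated candidates
        buffer.append(p)                                   # p becomes a candidate
    return buffer

pts = [(1, 9), (3, 3), (2, 8), (6, 2), (5, 5), (4, 4)]
print(bnl_skyline(pts))  # [(1, 9), (3, 3), (2, 8), (6, 2)]
```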

Sort-Filter Skyline (SFS)

SFS improves upon BNL by sorting the dataset before processing. With this approach, objects that are in the skyline buffer will never be dominated by later objects.

  1. Sort all objects based on a monotonic function (e.g., one of the attributes).
  2. Initialize an empty Skyline Buffer.
  3. For each object in the sorted dataset:
    • Compare it with all objects in the Skyline Buffer.
    • If it is dominated by any object in the buffer, discard it.
    • If it is not dominated by any of them, add it to the buffer: it is guaranteed to belong to the Skyline (see the sketch below).
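Under the same lower-is-better assumption, a minimal SFS sketch; the monotonic sort key (the sum of the attributes) is also an assumption:

```python
def dominates(a, b):
    """a dominates b assuming lower values are better on every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def sfs_skyline(points):
    """Sort-Filter Skyline: after sorting by a monotonic function (here the sum
    of the attributes), no point can be dominated by a later one, so the
    buffer never needs to be pruned."""
    skyline = []
    for p in sorted(points, key=sum):
        if not any(dominates(q, p) for q in skyline):
            skyline.append(p)
    return skyline

pts = [(1, 9), (3, 3), (2, 8), (6, 2), (5, 5), (4, 4)]
print(sfs_skyline(pts))  # [(3, 3), (6, 2), (1, 9), (2, 8)]
```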
Written by: Andrea Lunghi